Deep-learning-based blood pressure estimation using multi channel photoplethysmogram and finger pressure with attention mechanism

Recently, several studies have proposed cuffless methods for measuring blood pressure (BP) using finger photoplethysmogram (PPG) signals. This study presents a new BP estimation system that measures PPG signals under progressive finger pressure, making the system relatively robust to errors caused by finger position when using the cuffless oscillometric method. To reduce errors caused by finger position, we developed a sensor that can simultaneously measure multi-channel PPG and force signals in a wide field of view (FOV). We propose a deep-learning-based algorithm that uses an attention mechanism to learn to focus on the optimal PPG channel among the multi-channel PPG signals. The errors (ME ± STD) of the proposed multi-channel system were 0.43 ± 9.35 mmHg and 0.21 ± 7.72 mmHg for SBP and DBP, respectively. Through extensive experiments, we found a significant performance difference depending on the location of the PPG measurement in the BP estimation system using finger pressure.


Description of sensors and datasets
The oscillometric method for measuring BP uses the changes in PPG amplitude caused by the occlusion of blood vessels. To measure PPG signals accurately at high signal-to-noise ratios, the optical path of the sensor must include blood vessels. However, fingers have a complex vascular structure, with small arteries on both sides of the finger and across the tip of the nail. It is therefore difficult to form an optical path that reliably includes blood vessels using a single-channel PPG sensor. To overcome this challenge, we developed a multi-channel PPG sensor with a wide field of view (FOV).
Sensors. The proposed PPG sensor comprises the following components, as shown in Fig. 1a: three green and three infra-red (IR) light emitting diodes (LEDs) (wavelengths of 535 nm and 850 nm, respectively) and nine photodetectors (PDs). The LED and PD for multi channel PPG operate according to the timing chart, as shown in Fig. 1b: three LEDs for row positions (top, middle, and bottom PDs), and three PDs for column positions (right, center, and left PDs). LEDs were placed on both sides of the PDs to account for the finger blood vessel structure. In addition, the FOV for the multi channel was 5 mm × 4.5 mm, and the overall size was 12 mm × 7.5 mm, enabling a finger to cover the entire sensor. We used this sensor to measure 9-channel PPG signals from an IR LED with a sampling rate of 43 Hz. Figure 2 shows the system setup for the experiment. The PPG sensor was configured as a button on the experimental support, and the commercial force sensor was located under the PPG sensor to detect the force signal exerted by the finger. Notably, the 9-channel PPG and force signals were synchronized, measured, and subsequently fed into the analog-to-digital converter of the built-in mainboard.

Data collection.
Clinical trials were conducted separately at two sites: the MONIKI Hospital in Russia and Samsung Medical Center in Korea. Dataset1 was collected at the MONIKI Hospital and contained 1,450 cases from 290 participants; it was used to train the proposed BP estimation system. Dataset2 was collected at the Samsung Medical Center and contained 865 cases from 186 participants; it was used for training and testing. Each case included 40 s of synchronized 9-channel PPG and force signals, together with a reference BP obtained by two medical staff members using auscultation. Data collection was approved by the ethical committee of Samsung Medical Center (IRB Protocol No: 2020-06-065). The study design followed the International Standard (ISO 81060-2) 11 , which describes guidelines for the clinical investigation of non-invasive sphygmomanometers, including subject requirements (a minimum of 85 subjects and 255 BP values) and reference readings (the mean value from two observers using a double stethoscope). All examinees provided informed consent before the measurements were conducted. For the clinical trials, the proposed multi-channel PPG system was used in compliance with the standard protocol (ISO 81060-2). The reference BP was measured by two medical staff members using the auscultation method, and five measurements were conducted for each subject. The participants took a break of at least 5 min between measurements to ensure stability. The finger was then placed on a pre-marked guideline. After the measurement started, the participants were asked to gradually press the sensor with their index finger for 40 s while watching the pressure increase guide displayed on the computer screen. Table 1 lists the demographic information of the dataset.

Results
Setup. We used dataset1 (290 participants) and dataset2 (186 participants) for the training, validation, and testing of the BP estimation system. We divided the training, validation, and test data such that no participant appeared in more than one of them. Dataset1 was used only for training and validation, whereas dataset2 was used only for training and testing. We split dataset1 into 183 participants for training and 107 participants for model validation. Dataset2 was divided into five folds without overlapping participants; one fold was used for testing and the remaining folds were used for training, and each of the five folds was tested in turn. Therefore, we performed model verification through a five-fold cross-validation of dataset2. The BP estimation results obtained using dataset2 are reported for each of the five folds.
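The participant-wise five-fold split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the round-robin assignment, the random seed, and the toy participant IDs are all assumptions made for the example. The only property taken from the text is that no participant appears in both the training and the test data.

```python
import random


def participant_folds(case_participants, n_folds=5, seed=0):
    """Assign each participant (not each case) to one of n_folds folds.

    case_participants: list where entry i is the participant ID of case i.
    Returns a list of n_folds lists of case indices.
    """
    participants = sorted(set(case_participants))
    rng = random.Random(seed)
    rng.shuffle(participants)
    # Round-robin assignment keeps the number of participants per fold balanced.
    fold_of = {p: k % n_folds for k, p in enumerate(participants)}
    folds = [[] for _ in range(n_folds)]
    for case_idx, p in enumerate(case_participants):
        folds[fold_of[p]].append(case_idx)
    return folds


def train_test_indices(folds, test_fold):
    """Use one fold for testing and the remaining folds for training."""
    test = list(folds[test_fold])
    train = [i for k, fold in enumerate(folds) if k != test_fold for i in fold]
    return train, test


# Hypothetical toy data: 12 participants with 3 cases each.
toy_cases = [f"P{n:02d}" for n in range(12) for _ in range(3)]
toy_folds = participant_folds(toy_cases)
for k in range(len(toy_folds)):
    train, test = train_test_indices(toy_folds, k)
    train_ids = {toy_cases[i] for i in train}
    test_ids = {toy_cases[i] for i in test}
    assert train_ids.isdisjoint(test_ids)  # no participant on both sides
```

Splitting by participant rather than by case matters here because each participant contributes several cases; a case-level split would let the same finger appear in both training and test data.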
To train the nine single-channel CNN-based feature extractors and the multi-channel attention mechanism, the Adam optimizer 12 was used with $\beta_1 = 0.9$ and $\beta_2 = 0.999$, a learning rate of 0.005, and a mini-batch size of 64. To improve the generalization of the proposed BP estimation system, an $\ell_2$ regularization term with a scale of 0.005 and a dropout rate of 0.3 were used. Detailed model parameters are summarized in Table 2.
Evaluation metrics. We used the mean of the error (ME), the standard deviation of the error (STD), and Pearson's correlation coefficient (r) as the evaluation metrics for BP prediction. In addition to evaluating the overall BP estimation system, we compared and analyzed the BP estimation performance of the CNN-based feature extraction model for a single PPG channel and that of the attention mechanism.
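The three evaluation metrics can be computed as in the sketch below. It is an illustration under stated assumptions: the error is taken as estimate minus reference, and the STD uses the sample (n − 1) denominator; the text does not specify either convention, and the function name is hypothetical.

```python
import math


def bp_metrics(reference, estimate):
    """Return (ME, STD, r) for paired reference and estimated BP values."""
    n = len(reference)
    # Assumption: error is defined as estimate minus reference.
    errors = [est - ref for ref, est in zip(reference, estimate)]
    me = sum(errors) / n
    # Assumption: sample standard deviation (n - 1 denominator).
    std = math.sqrt(sum((e - me) ** 2 for e in errors) / (n - 1))
    mean_ref = sum(reference) / n
    mean_est = sum(estimate) / n
    cov = sum((ref - mean_ref) * (est - mean_est)
              for ref, est in zip(reference, estimate))
    var_ref = sum((ref - mean_ref) ** 2 for ref in reference)
    var_est = sum((est - mean_est) ** 2 for est in estimate)
    r = cov / math.sqrt(var_ref * var_est)
    return me, std, r
```

Note that ME and STD describe the error distribution, whereas r only measures how well the estimates track the reference linearly; a constant offset changes ME but leaves STD and r unchanged.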
Validation of the single-channel BP estimation system. Most studies that have developed a BP prediction model use Physionet's multi-parameter intelligent monitoring in intensive care (MIMIC) online waveform database 13 or the University of Queensland Vital Signs dataset 14 . These public datasets contain unpressurized single-channel PPG and ECG signals. Although studies 15,16 have used self-made datasets, these usually contain unpressurized single-channel PPG signals or ECG signals. In this study, we designed a BP prediction system using the signals obtained from the proposed multi channel PPG and finger pressure sensors. Because our self-made dataset contains multi-channel PPG signals applied under pressure and finger pressure signals, a direct comparison with other BP estimation studies is difficult. Therefore, we analyzed the proposed multichannel PPG-based BP estimation system and its components. Tables 3 and 4 compare the SBP and DBP estimation performances for each of the nine PPG channels. In terms of the STD metric, the SBP and DBP estimation performances were the best when using the PPG signals of the 2nd and 3rd channels, respectively. In comparison, when the 7th and 6th channel PPG signals were used, the SBP and DBP estimation performances were the worst. There was a relative performance gap of approximately 9.6% in the SBP performance between the 2nd and 7th channels, and the DBP performance had a difference of approximately 3.7% between the 3rd and 6th channels. Although the multi-channel PPG signals were acquired simultaneously, the significant difference in BP prediction performance between the different channels could be attributed to the difference in the position of the finger placed on the PPG sensor for each user and the characteristic difference of fingers. Therefore, it can be stated that it is difficult to collect PPG signals consistently for all users through a single-channel PPG sensor.
Validation of the attention mechanism system. Table 5 compares the SBP and DBP estimation performances of one of the single-channel BP estimators and the proposed multi-channel attention-based BP estimator. By combining multi-channel features using the multi-channel attention mechanism, the SBP estimation performance improved significantly compared to its best single-channel counterpart. More specifically, the STDs of the SBP estimates of the single-channel and multi-channel systems were 9.94 and 9.35 mmHg, respectively, a 6% relative improvement. As shown in Table 5, the DBP prediction performance also improved by 4.7% when the attention mechanism using the multi-channel PPG signals was used. Furthermore, for both the SBP and DBP estimation tasks, the Pearson correlation coefficient values improved across the five folds. From the data acquisition point of view, using nine channels increases the amount of data collected, but at a sampling rate of 43 Hz the increase is minor and can be handled without difficulty. In terms of performance, on the other hand, a large error can occur if the channel is selected incorrectly (e.g., the STD of channel 7 is 10.99 mmHg, which is 1.64 mmHg larger than that of the multi-channel system). The algorithm therefore reduces the variation in error caused by channel selection, and the results suggest that the proposed attention mechanism using multi-channel PPG signals is effective.
Analysis of attention weights. The attention mechanism of our proposed BP estimation system is important for improving the BP estimation performance. Figure 3 shows the attention weights of hypertension, hypotension, and normotensive data obtained for the SBP and DBP estimation tasks. The average value of the attention weights of each BP group was obtained and displayed as a bar graph.
Interestingly, the attention weights of some specific channels were relatively larger than those of others in both the hypertension and hypotension data. In the SBP attention mechanism, hypertension and hypotension data exhibited large attention weights in the 2nd and 4th channels, respectively. Moreover, the channel with the largest attention weight in the hypertension data tended to have a relatively low attention weight in the hypotension data, and vice versa. These trends were also observed for the attention weights of the DBP attention mechanism. Meanwhile, the attention weights of the normotensive data were relatively evenly distributed for both the SBP and DBP estimation tasks. These results suggest that our proposed multi-channel PPG sensor with a multi-channel attention mechanism can be effectively used to differentiate hypertensive and hypotensive users, thereby improving on single-channel PPG-based BP estimation models.
Validation of the effectiveness of the proposed attention mechanism by changing the attention method. We verified that the proposed multi-channel attention mechanism could improve the BP prediction performance of single-channel models. The attention mechanism could predict BP more accurately by considering the importance of latent features extracted from multiple single-channel PPG signals. In this subsection, we confirm the effectiveness of the attention mechanism by changing its attention method. Table 6 compares the SBP and DBP performance when using part of the nine channel features instead of all of them.
Figure 3. Bar graphs of the attention weights. In each of the 8 bar graphs, the x-axis represents the 9 channels, and the y-axis represents the probability values for the importance of the channels. The graphs are divided into hypertensive, hypotensive, and normotensive data and are presented for SBP and DBP.
In the learned attention mechanism, the top two and top three single-channel systems in terms of validation set performance were selected and applied to the test dataset. As shown in Table 6, although the average attention weights indicate the relative importance of the PPG channels, the hard selection of the two or three PPG channels with the largest attention weight values was not effective. In comparison, the proposed 9-channel attention mechanism significantly outperforms the hard channel selection methods in Folds 2, 3, and 4. This indicates that the adaptive, per-subject combination of multi-channel PPG signals provided by the proposed attention mechanism is effective.
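The difference between hard channel selection and the full attention-weighted combination can be illustrated with the sketch below. The text does not state how the features of the selected channels were combined, so renormalizing the attention weights over the top-k channels is purely an assumption for this example, and both function names are hypothetical.

```python
def soft_combine(weights, features):
    """Weighted sum of per-channel feature vectors using all attention weights."""
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]


def hard_top_k_combine(weights, features, k):
    """Keep only the k channels with the largest weights, then combine.

    Assumption: the kept weights are renormalized to sum to one.
    """
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    top = set(order[:k])
    total = sum(weights[i] for i in top)
    masked = [weights[i] / total if i in top else 0.0
              for i in range(len(weights))]
    return soft_combine(masked, features)
```

With hard selection the same few channels are used for every subject, whereas the soft combination lets the weights differ per subject, which is the property the comparison in the text attributes the improvement to.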

BP estimation accuracy analysis according to input signal combination.
We compared the BP estimation accuracy according to the input signal combination in the single-channel BP estimation model for SBP and DBP, respectively. We compared the performance when only PPG, first and second differential signals, and finger pressure signals were used, the performance when envelope signals and finger pressure signals were used, and the performance when both were used. As shown in Table 7, in the SBP model, channel two was tested, and in the DBP model, channel three was tested. Performance was better for PPG signals than for envelope signals, and the best performance was obtained when all signals were used.

Analysis of scatter plot and Bland-Altman plot.
To further verify the proposed BP estimation system, the scatter plot and Bland-Altman plot 17 for SBP and DBP estimation are shown in Figs. 4 and 5 , respectively. As shown in Fig. 4, the proposed BP prediction system showed high Pearson's correlation coefficients of 0.86 and 0.8 for SBP and DBP, respectively. The Bland-Altman plot showed that most of the SBP and DBP data samples were within the limits of agreement.
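For readers unfamiliar with the Bland-Altman analysis, the quantities behind such a plot can be computed as sketched below. The conventional limits of agreement (bias ± 1.96 standard deviations of the differences) are assumed here; the text does not state which multiplier was used, and the function name is hypothetical.

```python
import statistics


def bland_altman(reference, estimate):
    """Return per-sample means and differences, the bias, and the limits of agreement."""
    means = [(ref + est) / 2 for ref, est in zip(reference, estimate)]
    diffs = [est - ref for ref, est in zip(reference, estimate)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    # Assumption: conventional 95% limits of agreement.
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return means, diffs, bias, limits
```

The plot itself places `means` on the x-axis and `diffs` on the y-axis, with horizontal lines at the bias and at the two limits.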

Discussion
In this study, we developed a sensor that acquires multi-channel PPG signals and finger pressure signals, and proposed a cuffless BP estimation system. The multi-channel PPG signals were obtained by placing a finger on the proposed sensor and gradually applying pressure. Using the developed sensor, dataset1 and dataset2 were collected from the MONIKI Hospital in Russia and the Samsung Medical Center in Korea, respectively. Dataset1 and dataset2, which contain 290 and 186 participants, respectively, are small compared to the MIMIC online waveform database and the University of Queensland Vital Signs dataset, which many researchers use to train their BP prediction models 18,19 . Because the size of the training dataset is known to substantially affect the performance of neural-network-based BP prediction models 20 , additional performance improvements can be expected by collecting more training data using the developed sensor. However, clinical data collection is hampered by high cost, excessive time, and other limitations. Including inter- and intra-individual BP variation is important for the evaluation of cuffless devices, but such data are difficult to obtain 21 . Our dataset contains BP measured only under static conditions and does not capture BP variation within each individual. Additionally, demographics (e.g., age, gender) are often used as additional inputs to BP prediction models 21 .
However, our proposed model does not use demographic information, which spares users the hassle of entering it. In our study, dataset1 and dataset2 had slightly different conditions, such as the data acquisition environment, location, and some sensor specifications. When a BP estimation system is trained using two datasets from different domains, high accuracy on the target dataset is difficult to expect 22 . The proposed BP estimation system was verified using five-fold cross-validation with dataset2 as the target dataset, and the performance of each of the five folds was reported. As in other studies with insufficient clinical data, we used regularization terms and dropout to prevent model overfitting. If we apply a domain adaptation technique 23 that can achieve better performance on the target dataset while reducing the gap between datasets in different domains 24 , we believe that we can obtain more accurate BP estimates for dataset2. According to the Association for the Advancement of Medical Instrumentation (AAMI) standard 25 , the BP estimation error should be within 5 ± 8 mmHg. The proposed BP estimation system satisfied the AAMI criterion for DBP, and the SBP result was close to the AAMI criterion. The accuracy of our proposed BP estimation system can be improved if more data are collected and the domain mismatch between datasets collected in different environments is resolved. We plan to improve the deep-learning-based BP estimation system to obtain more accurate predicted BP values by applying domain adaptation techniques to our two datasets with different characteristics.
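The AAMI comparison above can be checked directly against the reported errors. In the sketch below, the criterion is read as a mean error within ±5 mmHg and a standard deviation of at most 8 mmHg; the function name is hypothetical, and the ME and STD values are those stated in the abstract.

```python
def meets_aami(me, std, me_limit=5.0, std_limit=8.0):
    """Check |ME| <= 5 mmHg and STD <= 8 mmHg."""
    return abs(me) <= me_limit and std <= std_limit


# Reported errors (ME, STD) in mmHg for the proposed multi-channel system.
reported = {"SBP": (0.43, 9.35), "DBP": (0.21, 7.72)}
for name, (me, std) in reported.items():
    print(name, meets_aami(me, std))
# SBP False
# DBP True
```

The SBP result misses the criterion only on the STD term (9.35 mmHg against the 8 mmHg limit), which is consistent with the statement that SBP is close to, but does not yet meet, the criterion.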

Method
We propose a novel BP prediction system with an attention mechanism that uses multi-channel PPG signals and a finger pressure signal. The proposed BP prediction system can learn to extract features from raw PPG and finger pressure signals using an end-to-end deep-learning method without relying on human-engineered, handcrafted feature extraction methods. Moreover, an attention mechanism allows the proposed system to effectively combine features extracted from each PPG channel.
Signal preprocessing and data preparation. Because the acquired raw PPG and finger pressure signals contained noise components, filtering was applied to remove noise. In addition, to provide various input signals to the feature extraction model, preprocessing steps were performed to obtain the envelope of the filtered PPG signal and the differential signals (i.e., the first- and second-order temporal derivatives). Previous studies on BP prediction using PPG signals showed that a BP prediction model can predict BP more accurately when the first- and second-order differential signals of the PPG are used in addition to the PPG signal itself 26,27 . Therefore, we modeled the BP prediction system by adding the envelope signal and the first- and second-order differential signals. A block diagram of the signal preprocessing method is shown in detail in Fig. 7. To remove noise components, the raw PPG signal of each channel was passed through a band-pass filter with a passband of 0.8-8 Hz. We also obtained the PPG envelope $X_e$ from the filtered PPG signal $X_p$ to provide various types of information to the CNN-based feature extractor. The PPG envelope was calculated through peak detection of the filtered PPG signal and interpolation. After obtaining the filtered PPG and PPG envelope signals, the first- and second-order derivative signals ($\Delta X_p$, $\Delta^2 X_p$, $\Delta X_e$, and $\Delta^2 X_e$) were obtained to increase the diversity of the input, as described above. The raw finger pressure signal was passed through a low-pass filter with a cut-off frequency of 0.2 Hz. We segmented the finger pressure signal from the maximum point of the PPG envelope signal to the left and right intervals of 5 seconds.
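The envelope step (peak detection followed by interpolation) can be sketched as follows. The text does not specify the peak detector or the interpolation scheme, so the simple local-maximum test and the linear interpolation used here are assumptions, as are the function names. The band-pass and low-pass filtering steps are not reproduced in this sketch; the input is assumed to be already filtered.

```python
def find_peaks(signal):
    """Indices of local maxima (the first sample of a plateau counts as the peak)."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i - 1] < signal[i] >= signal[i + 1]]


def envelope(signal):
    """Upper envelope by linear interpolation between detected peaks."""
    peaks = find_peaks(signal)
    if len(peaks) < 2:
        return list(signal)  # not enough peaks to interpolate
    env = [0.0] * len(signal)
    # Hold the first and last peak values outside the range covered by peaks.
    for i in range(peaks[0]):
        env[i] = signal[peaks[0]]
    for i in range(peaks[-1], len(signal)):
        env[i] = signal[peaks[-1]]
    # Linear interpolation between consecutive peaks.
    for a, b in zip(peaks, peaks[1:]):
        for i in range(a, b):
            t = (i - a) / (b - a)
            env[i] = (1 - t) * signal[a] + t * signal[b]
    return env
```

Because the oscillometric method relies on how the pulse amplitude changes as pressure increases, the envelope gives the network a slowly varying summary of that amplitude alongside the raw waveform.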
Through the signal preprocessing described above, we constructed a dataset ($X_1$, $X_2$, $X_3$, and $Y$) to train the proposed BP prediction system, where $X_1$ ($= X_p \oplus \Delta X_p \oplus \Delta^2 X_p$) is a concatenated input of PPG-related signals with dimensions of 1720 × 3. In addition, $X_2$ ($= X_e \oplus \Delta X_e \oplus \Delta^2 X_e$) is the concatenated input of PPG envelope-related signals with dimensions of 1720 × 3, and $X_3$ ($= X_f$) is a filtered finger pressure signal with dimensions of 215 × 1.
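The input dimensions follow from the sampling rate: 40 s at 43 Hz gives 1720 samples, and 215 samples correspond to 5 s at 43 Hz. The sketch below builds a 1720 × 3 input from a signal and its first- and second-order differences. The backward-difference scheme, the zero padding that keeps the length unchanged, and all names are assumptions; the text does not say how the derivatives were computed.

```python
import math

FS_HZ = 43        # sampling rate stated in the text
PPG_SECONDS = 40  # duration of each recording stated in the text


def diff(signal):
    """First-order backward difference, zero-padded so the length is unchanged."""
    return [0.0] + [b - a for a, b in zip(signal, signal[1:])]


def stack_with_derivatives(signal):
    """Rows of [x, first difference, second difference]: a len(signal) x 3 input."""
    d1 = diff(signal)
    d2 = diff(d1)
    return [list(row) for row in zip(signal, d1, d2)]


# Hypothetical stand-in for a filtered PPG channel: a 1.2 Hz sine wave.
n_samples = FS_HZ * PPG_SECONDS
x_p = [math.sin(2 * math.pi * 1.2 * n / FS_HZ) for n in range(n_samples)]
x_1 = stack_with_derivatives(x_p)
print(len(x_1), len(x_1[0]))  # 1720 3
```

The same function applied to the envelope signal would give the second 1720 × 3 input.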
CNN-based feature extraction. A CNN learns the relationship between neighboring data points through the convolution operation and can compress information from the input signal through pooling layers 28 . Thus, we constructed three parallel input streams in the CNN model such that features were extracted separately for each of the $X_1$, $X_2$, and $X_3$ inputs. The overall architecture of the proposed deep-learning-based BP estimation system is shown in Fig. 6. The three input streams are expressed as $C(\cdot): X_i \rightarrow Z_i$, where $X$ and $Z$ denote the input and the extracted features, respectively, and $i \in \{1, 2, 3\}$ denotes the type of input feature. Each input stream first applies a convolution, batch normalization (BN), and the ReLU activation function, followed by a max-pooling operation. Subsequently, three CNN blocks, each with two repeats of convolution, BN, and ReLU with a residual connection, are stacked, and an average pooling layer aggregates the information from each feature stream. Finally, the features extracted from the three input streams are concatenated to form a single feature $Z = Z_1 \oplus Z_2 \oplus Z_3$. The concatenated feature $Z$ is then passed to a fully connected layer with a sigmoid nonlinearity, producing the final latent feature, which contains various types of information extracted from the different input signals. The residual connections mitigate the vanishing gradient problem 29 when training the feature extraction model. After the CNN-based feature extraction models are trained for each PPG channel, the attention mechanism can be trained to combine the multi-channel features for BP prediction.
To train the feature extraction model, the last output layer produces the estimated BP $\hat{y} \in \mathbb{R}$. The model is then trained to minimize the mean squared error (MSE) between the reference and estimated BPs:

$$\mathcal{L}_{\mathrm{MSE}} = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2 \qquad (1)$$

where $N$ denotes the number of samples, $y_i$ denotes the reference BP, and $\hat{y}_i$ denotes the estimated BP. To extract features from each of the 9-channel PPG signals, we trained nine feature extraction models, one for each PPG channel. The trained 9-channel feature extraction models can be expressed as $f(\cdot): [X_1^i, X_2^i, X_3^i] \rightarrow Z_i$, where $Z_i$ is the final latent feature and $i \in \{1, 2, \ldots, 9\}$ is the PPG channel index. The nine 16-dimensional latent features, $Z \in \mathbb{R}^{16 \times 9}$, were used as inputs to the multi-channel attention model for the final BP prediction. The attention-based multi-channel BP estimation performance is compared with the per-channel BP estimation performance in the Results section.
Attention mechanism. Recently, attention mechanisms have proven effective in many fields, such as speech recognition 30,31 and natural language processing 32 . Attention mechanisms are neural networks that focus on important regions. We extracted features for each channel from the CNN-based feature extraction models using the multi-channel PPG and finger pressure signals. However, because the position of the finger placed on the proposed multi-channel PPG sensor and the characteristics of the fingers may differ for each user, the importance of each channel for BP estimation may also differ for each user. Therefore, we applied an attention mechanism to the proposed BP estimation system to adaptively weight the channel-wise features according to their importance in estimating the BP of each user. As shown in Fig. 6, the extracted features $Z_i$ for the PPG channels $i \in \{1, \ldots, 9\}$ were passed to an attention layer comprising a single-layer perceptron, $s(\cdot): Z_i \rightarrow S_i$, to obtain a score $S_i$ representing the importance of each channel. The score $S_i$ is obtained as follows:

$$S_i = s(\omega Z_i + b) \qquad (2)$$

where $\omega$ and $b$ are the trainable weights and biases, respectively. Subsequently, the attention weight $W_i$ was calculated from the score $S_i$ using the softmax function, which expresses the importance of each channel as a probability value:

$$W_i = \frac{\exp(S_i)}{\sum_{j=1}^{9} \exp(S_j)} \qquad (3)$$

The attention-weighted feature $Z' \in \mathbb{R}^{16 \times 1}$ was obtained through the weighted summation of the attention weights $W_i$ and the corresponding features $Z_i$, that is, $Z' = \sum_{i=1}^{9} W_i Z_i$. The proposed BP estimation system produces an estimated BP through the output layer using the attention-weighted feature $Z'$. To train the attention mechanism model, the MSE loss was calculated between the reference BP and the estimated BP $\hat{y}$ obtained from the output layer of the attention mechanism.
The proposed model was trained separately to minimize the MSE for systolic and diastolic BP.
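The forward pass of the channel attention described above (score, softmax weight, weighted sum) can be sketched in plain Python as follows. This is an illustration rather than the authors' implementation: the score function defaults to an identity activation because the text does not specify the nonlinearity, and the function names and parameter shapes (one weight vector of length 16 and a scalar bias shared across channels) are assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, omega, b, activation=lambda v: v):
    """Combine per-channel features into one attention-weighted feature.

    features: list of per-channel feature vectors (e.g., 9 vectors of length 16).
    omega, b: weight vector and scalar bias of the scoring layer (assumed shapes).
    activation: assumed identity by default; the nonlinearity is not specified.
    """
    scores = [activation(sum(w * z for w, z in zip(omega, feat)) + b)
              for feat in features]
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [sum(wt * feat[d] for wt, feat in zip(weights, features))
              for d in range(dim)]
    return weights, pooled
```

Because the weights are computed from the features of the current recording, they can differ from one user to the next, which is what allows the system to adapt to finger placement.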

Conclusion
In this study, we developed a multi-channel PPG sensor that acquires multi-channel PPG signals at different wavelengths. Moreover, we devised a deep-learning-based BP estimation system that can predict BP from the multi-channel PPG signals acquired by the proposed sensor and the finger pressure signal. The proposed BP estimation system can extract features without human engineering and accurately predict the BP through an attention mechanism. Through attention weight analysis, we confirmed that the attention mechanism can improve the prediction performance for hypertension and hypotension. Because the proposed deep-learning-based BP estimation system is cuff-free and calibration-free, it makes regular BP monitoring possible and has the potential to help diagnose hypertension at an early stage. The proposed BP estimation system can potentially enable regular BP monitoring of multiple users through mobile devices, such as smartphones or smart wristwatches.

Data availability
The data that support the findings of this study are available via e-mail from the corresponding author upon reasonable request.